6 research outputs found

    A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

    The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
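
The "conflicting updates on the result vector" mentioned in this abstract can be illustrated with a minimal serial sketch. It is not the RACE implementation; it assumes the upper triangle of the matrix (diagonal included) is stored in CSR form, and the names `symm_spmv`, `row_ptr`, `col_idx`, and `values` are hypothetical choices for this illustration.

```python
def symm_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a symmetric A whose upper triangle
    (diagonal included) is stored in CSR form. Illustrative sketch only."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]      # contribution of the stored entry A[i][j]
            if j != i:
                # Contribution of the mirrored entry A[j][i]. This scattered
                # write to y[j] is what conflicts when rows run in parallel.
                y[j] += a * x[i]
    return y


# Upper triangle of A = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = symm_spmv(row_ptr, col_idx, values, x)
```

If two threads processed rows 0 and 1 of this example at the same time, both would update `y[1]`, which is the kind of race a coloring of the rows is meant to rule out.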

    Implementation and Performance Engineering of the Kaczmarz Method for Parallel Systems

    The Kaczmarz method is a simple and robust iterative solver for linear systems of equations. It is used in different fields of science and engineering, ranging from medical imaging to solving convection-dominated flows, Helmholtz equations, and eigenvalue problems. In this thesis, we investigate hardware efficiency and scalable shared-memory parallelization strategies for the Kaczmarz method when used as a solver for sparse linear systems. The inherent data dependencies of this method prevent fine-grained parallelism such as SIMD or multi-threading from being used efficiently. However, there exist techniques like multicoloring which can enable this level of parallelism. A critical analysis of the multicoloring approach, both in terms of performance and qualitative behavior, reveals its deficiencies on modern compute platforms. Starting with existing ideas, this thesis proposes a novel "block multicoloring" method, which leverages structural features of (partly) band- or hull-structured matrices. A thorough node-level performance analysis demonstrates that this approach outperforms traditional multicoloring significantly (up to 3x on a single compute node) for a selection of relevant application matrices and never falls behind it even for malicious cases. Finally, our Kaczmarz implementation combined with block multicoloring is used as a linear solver in the FEAST method to compute inner eigenvalues of large sparse matrices. These first results demonstrate the applicability of the presented approach and indicate its superiority for large-scale computations as compared to direct solvers, which are state of the art for the FEAST method.
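
The data dependencies this abstract refers to are visible in a plain serial Kaczmarz sweep. The sketch below is the textbook row-projection update for a CSR matrix, not the thesis implementation; the function name `kaczmarz_sweep` and the small test system are assumptions made for illustration.

```python
def kaczmarz_sweep(row_ptr, col_idx, values, b, x):
    """One in-place Kaczmarz sweep over all rows of a CSR matrix.
    Each row i projects x onto the hyperplane a_i . x = b_i."""
    for i in range(len(b)):
        start, end = row_ptr[i], row_ptr[i + 1]
        dot = sum(values[k] * x[col_idx[k]] for k in range(start, end))
        norm_sq = sum(values[k] ** 2 for k in range(start, end))
        if norm_sq == 0.0:
            continue  # skip empty rows
        scale = (b[i] - dot) / norm_sq
        # This update writes entries of x that later rows read, so rows
        # sharing a column index cannot simply be processed concurrently.
        for k in range(start, end):
            x[col_idx[k]] += scale * values[k]
    return x


# Small consistent system: A = [[4, 1], [1, 3]], b = [1, 2]
row_ptr = [0, 2, 4]
col_idx = [0, 1, 0, 1]
values = [4.0, 1.0, 1.0, 3.0]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(200):
    kaczmarz_sweep(row_ptr, col_idx, values, b, x)
```

For this system the iterates approach the exact solution (1/11, 7/11). Rows that share no column index touch disjoint entries of `x`, which is the property multicoloring exploits to group rows for parallel execution.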
